Foreign language learning apparatus, foreign language learning method, and medium

ABSTRACT

A speech recognition unit ( 114 ) and a processor unit ( 116 ) of a foreign language learning device ( 100 ) receive sentence speech information corresponding to a sentence pronounced by a learner ( 2 ) to separate the information into word speech information on the basis of words included in the sentence. The processor unit ( 116 ) evaluates the degree of matching (likelihood) of each word speech information with a model speech, and a resultant evaluation is indicated on a display unit ( 120 ) on the basis of each word.

TECHNICAL FIELD

[0001] The present invention relates to a device and a method for learning foreign languages by means of a speech recognition system, and to a computer-readable medium having recorded thereon a program for causing a computer to execute such a foreign language learning method.

BACKGROUND ART

[0002] In recent years, considerable attempts have been made to apply speech recognition systems to the learning of foreign languages. Specifically, a learner uses a foreign language learning device to read out one or a plurality of sentences in a foreign language so that the pronounced sentence(s) are input to a personal computer (computing machine) through its voice input function. A speech recognition system incorporated in the personal computer and adapted to that foreign language evaluates to what degree the sentence(s) read out by the learner can be accurately recognized, and the resultant rating is then displayed as feedback to the learner.

[0003] However, the speech recognition system used by the conventional foreign language learning device was originally devised with the objective of replacing keyboard input to the personal computer with voice input. Accordingly, sentences pronounced by the learner are recognized sentence by sentence, and the recognized sentence and the original sentence are compared to output the result of the comparison. Therefore, the learner can know only a rating for the sentence evaluated as a whole.

[0004] In practice, the rating is rarely uniform over the entire sentence. Generally, a higher rating is achieved for a specific part of the sentence while a lower rating is given for another part.

[0005] The learner therefore cannot know, from the rating of the whole sentence, which part of the sentence received a low rating for pronunciation, particularly when the overall rating is low. Consequently, the learner pronounces the entire sentence again and again until the rating rises, which impairs the learning efficiency.

DISCLOSURE OF THE INVENTION

[0006] One object of the present invention is to provide a foreign language learning device capable of presenting a rating for pronunciation of a sentence in a foreign language pronounced by a learner so as to enable the learner to efficiently practice the pronunciation of the foreign language.

[0007] Another object of the present invention is to provide a foreign language learning method by which a rating for pronunciation of a sentence in a foreign language pronounced by a learner can efficiently be fed back to the learner practicing the pronunciation of the foreign language.

[0008] Still another object of the invention is to provide a computer-readable medium having recorded thereon a program for executing, by a computer, a foreign language learning method by which a rating for pronunciation of a sentence in a foreign language pronounced by a learner can efficiently be fed back to the learner practicing the pronunciation of the foreign language.

[0009] A foreign language learning device according to the present invention includes, for the purpose of achieving those objects, word separation means, likelihood determination means and display means. The word separation means receives sentence speech information corresponding to a sentence pronounced by a learner to separate the sentence speech information into word speech information on the basis of each word included in the sentence. The likelihood determination means evaluates degree of matching of each word speech information with a model speech. The display means displays, for each word, a resultant evaluation determined by the likelihood determination means.

[0010] Preferably, the foreign language learning device further includes storage means and output means. The storage means stores a model sentence to be pronounced by the learner and model phoneme array information corresponding to the model sentence. The output means presents the model sentence to the learner in advance. The word separation means includes phoneme recognition means and word speech recognition means. The phoneme recognition means recognizes the sentence speech information on the basis of each phoneme information. The word speech recognition means recognizes the word speech information for each word according to the phoneme information and the model phoneme array information after the separation.

[0011] According to another aspect of the invention, a foreign language learning method includes the steps of receiving sentence speech information corresponding to a sentence pronounced by a learner and accordingly separating the sentence speech information into word speech information on the basis of each word included in the sentence, evaluating degree of matching of each word speech information with a model speech, and displaying, for each word, a resultant evaluation of each word speech information.

[0012] Preferably, the foreign language learning method further includes the step of presenting a model sentence to the learner in advance. The step of separating the sentence speech information into the word speech information includes the steps of recognizing the sentence speech information on the basis of each phoneme information, and recognizing the word speech information for each word according to model phoneme array information corresponding to the model sentence presented to the learner and the phoneme information after the separation.

[0013] According to still another aspect of the invention, a foreign language learning device includes storage means, output means, word separation means, likelihood determination means, display means, and pronunciation evaluation means. The storage means stores a model sentence to be pronounced by a learner and model phoneme array information corresponding to the model sentence. The output means presents the model sentence to the learner in advance. The word separation means receives sentence speech information corresponding to a sentence pronounced by the learner to separate the sentence speech information into word speech information on the basis of each word included in the sentence. The likelihood determination means evaluates degree of matching of each word speech information with a model speech. The display means displays, for each phoneme and each word, a resultant evaluation by the likelihood determination means. The pronunciation evaluation means evaluates a resultant pronunciation after practice of the pronunciation for each phoneme and for each word in the model sentence uttered by the learner in a pronunciation practice period. The word separation means includes phoneme recognition means and word speech recognition means. The phoneme recognition means recognizes the sentence speech information on the basis of each phoneme information. The word speech recognition means recognizes the word speech information for each word according to the phoneme information and model phoneme array information after the separation.

[0014] According to a further aspect of the invention, a computer-readable medium has recorded thereon a program for executing a foreign language learning method by a computer. The foreign language learning method includes the steps of receiving sentence speech information corresponding to a sentence pronounced by a learner and accordingly separating the sentence speech information into word speech information on the basis of each word included in the sentence, evaluating degree of matching of each word speech information with a model speech, and displaying, for each word, a resultant evaluation of each word speech information.

[0015] Accordingly, by the foreign language learning device or the foreign language learning method, a rating is shown for each word in a sentence pronounced by the learner. The resultant rating for the pronunciation of the sentence in a foreign language uttered by the learner can thus efficiently be fed back to the learner practicing the pronunciation of the foreign language.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] FIG. 1 is a schematic block diagram illustrating a structure of a foreign language learning device 100 according to the present invention.

[0017] FIG. 2 is a conceptual representation illustrating a structure of sentence speech information on one of the model sentences.

[0018] FIG. 3 is a flowchart illustrating a flow of foreign language learning implemented by the foreign language learning device 100 shown in FIG. 1.

[0019] FIG. 4 is a conceptual representation illustrating an operation of a speech recognition unit 114.

[0020] FIG. 5 is a conceptual representation showing a method of extracting phoneme speech information from speech information regarding a recorded sentence according to likelihoods on the basis of each segment.

[0021] FIG. 6 is a conceptual representation showing a procedure for determining the likelihood for each phoneme of recorded speech as well as the likelihood for a word of the recorded speech.

[0022] FIG. 7 shows a path through which phonemes make transition with time when pronunciation is exactly the same as that of a model sentence and shows a procedure for determining likelihoods for evaluation of pronunciation.

[0023] FIG. 8 is a schematic block diagram illustrating a structure of a foreign language learning device 200 according to a second embodiment.

[0024] FIG. 9 is a flowchart illustrating a foreign language learning process by the foreign language learning device 200 shown in FIG. 8.

[0025] FIG. 10 is a flowchart showing, in more detail, a process followed in the steps of calculating and displaying a rating for each word and practicing pronunciation word by word and phoneme by phoneme.

[0026] FIG. 11 is a flowchart illustrating a process for preliminarily performing a learning process with respect to a Hidden Markov Model for speech recognition.

[0027] FIG. 12 is a flowchart illustrating a process flow for calculating a rating for each phoneme in each word.

[0028] FIG. 13 is a first representation showing a shape of a vocal tract when “L” is pronounced.

[0029] FIG. 14 is a second representation showing a shape of the vocal tract when “L” is pronounced.

[0030] FIG. 15 is a first representation showing a shape of the vocal tract when “R” is pronounced.

[0031] FIG. 16 is a second representation showing a shape of the vocal tract when “R” is pronounced.

[0032] FIG. 17 shows a change in resonance frequency pattern with time, presented as information to a learner practicing phoneme pronunciation.

[0033] FIG. 18 shows a display screen indicating a formant position presented as further information to the learner practicing phoneme pronunciation.

BEST MODES FOR CARRYING OUT THE INVENTION

[0034] Embodiments of the present invention are now described in conjunction with the drawings.

[0035] [First Embodiment]

[0036] FIG. 1 is a schematic block diagram illustrating a structure of a foreign language learning device 100 according to the present invention.

[0037] Although the English language is herein used as an example of a foreign language, use of the present invention is not limited to English and is applicable generally to any language to be learned by a learner that is not the native language of the learner, which will become clear from the following description.

[0038] Referring to FIG. 1, foreign language learning device 100 includes a microphone 102 for acquiring voice produced by a learner 2, a microcomputer 110 receiving an output of microphone 102 for processing voice information corresponding to a sentence pronounced by learner 2 to determine a rating for pronunciation by the learner for each word included in that sentence in accordance with an expected pronunciation, and a display unit (display) 120 for presenting an original sentence to be pronounced by learner 2 that is supplied from microcomputer 110 and displaying a rating for the learner's pronunciation of each word, the rating determined word by word.

[0039] The original sentence to be pronounced by learner 2 (hereinafter referred to as model sentence) may be presented as character information on display unit 120 to learner 2 or as sound from a loudspeaker 104 to learner 2. For practice of pronunciation of each word described below, a model pronunciation can be output as sound from loudspeaker 104.

[0040] Microcomputer 110 includes a speech input/output unit 112 serving as an interface for receiving a speech signal from microphone 102 and providing a speech signal to loudspeaker 104, a speech recognition unit 114 analyzing and separating, according to a signal from speech input/output unit 112, speech information corresponding to a sentence supplied to microphone 102 (hereinafter referred to as “sentence speech information”) into phoneme information included in the sentence speech information as described below, a data storage unit 118 for temporarily storing the sentence speech information and holding the model sentence and phoneme information corresponding to the model sentence as well as information about word boundary, and a processor unit 116 determining, according to the result of separation by speech recognition unit 114 and the information about the model sentence which is held in data storage unit 118 and is provided to learner 2 for inducing the learner to pronounce the sentence, a rating for pronunciation by learner 2 on the basis of each word included in the model sentence, the rating determined relative to the phoneme information about the model sentence (model phoneme information).

[0041] [Structure of Sentence Speech Information]

[0042] FIG. 2 is a conceptual representation illustrating a structure of sentence speech information about one of the model sentences.

[0043] The example shown in FIG. 2 is a model sentence “I have a red pen.”

[0044] Spoken language has a hierarchy as shown in FIG. 2. A sentence is segmented into words, then syllables (a syllable is a unit consisting of a consonant and a vowel that is usually represented by one kana character in Japanese) and further into phonemes (single consonant, single vowel).
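The hierarchy of [0044] can be pictured as a nested data structure. The following is only an illustrative sketch: the class names are hypothetical, and the phoneme symbols chosen for “red” and “pen” are assumptions made for the example, not notation taken from FIG. 2. Every word of the example sentence happens to be monosyllabic, so each word holds a single syllable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    phonemes: List[str]


@dataclass
class Word:
    text: str
    syllables: List[Syllable]

    def phonemes(self) -> List[str]:
        # Flatten the syllable layer into the phoneme layer.
        return [p for s in self.syllables for p in s.phonemes]


@dataclass
class Sentence:
    words: List[Word]

    def phonemes(self) -> List[str]:
        # Flatten words -> syllables -> phonemes.
        return [p for w in self.words for p in w.phonemes()]


# Illustrative phoneme symbols (assumed); one syllable per word.
sentence = Sentence([
    Word("I", [Syllable(["ai"])]),
    Word("have", [Syllable(["h", "ae", "v"])]),
    Word("a", [Syllable(["a"])]),
    Word("red", [Syllable(["r", "e", "d"])]),
    Word("pen", [Syllable(["p", "e", "n"])]),
])

print(len(sentence.words), len(sentence.phonemes()))
```

Keeping the layers explicit means a rating computed at the phoneme layer can later be aggregated upward to words and to the whole sentence.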

[0045] The process of segmenting one sentence differs somewhat between languages. For some languages, so-called “phrases” may be formed as an intermediate layer between the sentence and words.

[0046] FIG. 3 is a flowchart illustrating a flow of foreign language learning implemented by foreign language learning device 100 shown in FIG. 1.

[0047] As clearly understood from FIG. 3, through the foreign language learning by means of foreign language learning device 100, the hierarchy of speech language can be utilized to make a general evaluation of pronunciation of each sentence read out by a learner as well as an evaluation of pronunciation of each word and even each phoneme, and accordingly feed back a rating for the pronunciation to the learner. The learner can then practice, according to the given rating, pronunciation of each word or phoneme for which a low rating is given. In particular, since a rating is displayed for each word, the influence of measurement errors for individual phonemes is reduced, and the learner can practice pronunciation word by word, which is easy for the learner. An efficient pronunciation practice is thus possible.

[0048] Referring to FIG. 3, foreign language learning is started (step S100), and a model sentence to be pronounced is presented by display unit 120 to learner 2 (step S102).

[0049] Learner 2 pronounces the model sentence and accordingly speech information corresponding to the model sentence (sentence speech information) is acquired via microphone 102 and speech input/output unit 112 (step S104).

[0050] Speech recognition unit 114 recognizes, according to a signal provided from speech input/output unit 112, the sentence speech information as speech information on the basis of a phoneme (step S106).

[0051] Processor unit 116 compares the speech information of phonemes separated by speech recognition unit 114 with model phoneme information for the model sentence that is stored in data storage unit 118 to recognize the speech information on the basis of each word (step S108).

[0052] Then, for each word in the sentence speech information, processor unit 116 refers to the model phoneme information for the model sentence stored in data storage unit 118 to determine a rating for pronunciation of each word and outputs the rating onto display unit 120 (step S110). At this time, a rating for each phoneme included in each word may be output together with the rating for the word.

[0053] Learner 2 then practices, according to the rating on the basis of each word or each phoneme, pronunciation word by word or phoneme by phoneme which the learner cannot pronounce appropriately (step S112).

[0054] When it is determined that the pronunciation practice is completed, an instruction is given regarding whether or not pronunciation of the model sentence will be retried by learner 2 through an input device (keyboard or speech input unit) of personal computer 110 (step S114). When an instruction is given that retry should be made, the process returns to step S104. Otherwise, the process proceeds to the next step S116.

[0055] Then, an instruction is given by learner 2 via the input device of personal computer 110 as to whether pronunciation practice of another model sentence should be tried (step S116). When the instruction that pronunciation practice should be done is given, the process returns to step S102. Otherwise, the process is completed (step S120).

[0056] [Method of Determining Rating for each Word]

[0057] A method of determining a rating for pronunciation of each word is detailed below.

[0058] FIG. 4 is a conceptual representation illustrating an operation of speech recognition unit 114.

[0059] A waveform of speech uttered by learner 2 is stored temporarily in data storage unit 118 and thus recorded. Speech recognition unit 114 divides the recorded speech waveform into segments of a certain length such as segment A, segment B, segment C and the like to determine likelihoods of phonemes for each segment. For each segment, a likelihood is evaluated for every phoneme sampled in advance, that is, for every phoneme that can appear in English pronunciation. In other words, respective likelihoods of all English phonemes are determined for each segment.

[0060] Specifically, speech recognition unit 114 compares a model set of acoustic feature vectors of respective phonemes produced in advance from speech samples of a plurality of speakers with a set of acoustic feature vectors for a specific segment of the recorded speech to determine likelihoods for each segment by means of the well-known maximum likelihood estimation.

[0061] This maximum likelihood estimation is disclosed for example in a document “Probability, Random Variables, and Stochastic Processes (Third Edition)”, Ed. Athanasios Papoulis, McGraw-Hill, Inc., New York, Tokyo (1991).
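The per-segment scoring of [0059] and [0060] can be sketched as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: each phoneme model is assumed to be a one-dimensional Gaussian over a toy scalar feature (mean absolute amplitude), whereas the text speaks of acoustic feature vectors. The function names and the two example models are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def split_segments(samples: List[float], seg_len: int) -> List[List[float]]:
    """Cut the recorded waveform into consecutive fixed-length segments."""
    last_start = len(samples) - seg_len
    return [samples[i:i + seg_len] for i in range(0, last_start + 1, seg_len)]


def feature(segment: List[float]) -> float:
    """Toy acoustic feature (assumption): mean absolute amplitude."""
    return sum(abs(x) for x in segment) / len(segment)


def log_likelihood(x: float, mean: float, std: float) -> float:
    """Log density of a 1-D Gaussian phoneme model at feature value x."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def segment_likelihoods(
    samples: List[float],
    seg_len: int,
    models: Dict[str, Tuple[float, float]],
) -> List[Dict[str, float]]:
    """Return one {phoneme: log-likelihood} table per segment."""
    return [
        {ph: log_likelihood(feature(seg), m, s) for ph, (m, s) in models.items()}
        for seg in split_segments(samples, seg_len)
    ]


# Hypothetical models: phoneme -> (mean, standard deviation) of the feature.
models = {"a": (0.8, 0.1), "s": (0.2, 0.1)}
samples = [0.8, -0.8, 0.8, -0.8, 0.2, -0.2, 0.2, -0.2]

table = segment_likelihoods(samples, 4, models)
best = [max(row, key=row.get) for row in table]
print(best)  # ['a', 's']
```

Every phoneme model is scored against every segment, which yields the likelihood distribution plane that the following paragraphs refer to.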

[0062] FIG. 5 shows a distribution of likelihoods with the vertical axis indicating phonemes which can appear in the English language and the horizontal axis indicating the segments. On this plane of likelihood distribution, an optimum path of phonemes is selected that corresponds to a result of speech recognition.

[0063] The class of an optimum phoneme (with maximum likelihood) makes transition with time and accordingly it is determined that a transition to the next phoneme is made and the boundary of phonemes is recognized.

[0064] In FIG. 5, the bold line represents a path through which such an optimum phoneme passes with time among path candidates for mistakenly utterable phoneme sequences.
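The path selection and boundary detection of [0062] to [0064] can be illustrated with a greedy sketch that picks the maximum-likelihood phoneme in each segment and places a boundary wherever that phoneme changes. The greedy choice, the function names, and the likelihood values are assumptions made for illustration; a practical recognizer would typically also constrain transitions between segments.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def best_path(table: List[Dict[str, float]]) -> List[str]:
    """Pick the phoneme with the maximum likelihood in each segment."""
    return [max(row, key=row.get) for row in table]


def phoneme_runs(path: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse the path into (phoneme, first_segment, last_segment) runs.

    A boundary is recognized wherever the optimum phoneme changes.
    """
    runs = []
    start = 0
    for phoneme, group in groupby(path):
        length = len(list(group))
        runs.append((phoneme, start, start + length - 1))
        start += length
    return runs


# Hypothetical likelihoods: one row per segment, one column per phoneme.
table = [
    {"p": 0.7, "e": 0.2, "n": 0.1},
    {"p": 0.6, "e": 0.3, "n": 0.1},
    {"p": 0.2, "e": 0.7, "n": 0.1},
    {"p": 0.1, "e": 0.3, "n": 0.6},
]

path = best_path(table)
runs = phoneme_runs(path)
print(path)  # ['p', 'p', 'e', 'n']
print(runs)  # [('p', 0, 1), ('e', 2, 2), ('n', 3, 3)]
```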

[0065] FIG. 6 is a conceptual representation showing a procedure for determining, by processor unit 116, a likelihood of each phoneme of the recorded speech and a likelihood of a word according to thus determined phoneme speech information for each segment of the recorded speech.

[0066] Specifically, processor unit 116 calculates the average of likelihoods for each phoneme recognized from the recorded speech to determine the likelihood of each phoneme.

[0067] Processor unit 116 further determines the likelihood of each word by calculating the sum or average of phoneme likelihoods for each word according to respective likelihoods of phonemes along the path as shown in FIG. 5 among the mistakenly utterable candidate sequences determined from the recorded speech waveform.

[0068] More specifically, when content-descriptive information, for example, a model sentence “I have a red pen” is given in advance, processor unit 116 determines the likelihood of each word (hereinafter “word likelihood”) by calculating the sum or average of respective likelihoods of phonemes included in each word according to information about phonetic notation of the model sentence, namely /ai : h ae v : a : red : pen/ and to information about the boundary of words (“:” included in the phonetic notation) along the path among mistakenly utterable candidate sequences. The information about the array of phonemes of the model sentence and the information about word boundary are hereinafter referred to as “model phoneme array information” as a whole.
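The aggregation in [0067] and [0068] can be sketched as below. The parser treats “:” as the word boundary and whitespace as the separator between tokens, exactly as the notation is written in [0068]. The function names and the per-token likelihood values are hypothetical and serve only to show the sum-or-average step.

```python
from typing import List


def parse_model_array(notation: str) -> List[List[str]]:
    """Split '/ai : h ae v : .../' into words (':'-separated) of tokens."""
    body = notation.strip().strip("/")
    return [word.split() for word in body.split(":")]


def word_likelihoods(
    words: List[List[str]],
    phoneme_scores: List[float],
    use_average: bool = True,
) -> List[float]:
    """Sum or average the phoneme likelihoods that fall inside each word."""
    result = []
    index = 0
    for tokens in words:
        chunk = phoneme_scores[index:index + len(tokens)]
        index += len(tokens)
        total = sum(chunk)
        result.append(total / len(chunk) if use_average else total)
    if index != len(phoneme_scores):
        raise ValueError("number of scores does not match the phoneme array")
    return result


words = parse_model_array("/ai : h ae v : a : red : pen/")
# Hypothetical per-token likelihoods, one value per token in reading order.
scores = [0.9, 0.8, 0.6, 0.7, 0.5, 0.4, 0.9]

averages = word_likelihoods(words, scores)
sums = word_likelihoods(words, scores, use_average=False)
print(words)
print(averages)
```

Averaging keeps long and short words on a comparable scale, whereas summing lets longer words accumulate larger values; the text allows either.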

[0069] FIG. 7 illustrates a procedure for determining, on the likelihood distribution plane shown in FIG. 5, a path through which phonemes change with time when the model sentence is pronounced exactly as it is and likelihoods for evaluating the pronunciation.

[0070] Referring to FIG. 7, according to the content-descriptive information given in advance, processor unit 116 determines word likelihood by calculating the sum or average of phoneme likelihoods of phonemes included in each word, along the path corresponding to the phoneme array when the model sentence with the content-descriptive information is exactly pronounced, through the procedure as described above in conjunction with FIGS. 5 and 6.

[0071] Then, processor unit 116 compares each word likelihood determined as described above along the path corresponding to the phoneme array exactly the same as the content-descriptive information (phonetic array as per the model phoneme array information) with each word likelihood along a mistakenly utterable candidate path for each word determined from the recorded speech waveform, and accordingly determines a rating from the relative relation therebetween.

[0072] Suppose, for example, that each word likelihood determined along the path corresponding to the phoneme array exactly the same as the content-descriptive information is referred to as “word likelihood of ideal path” and the sum of word likelihoods determined along the mistakenly utterable candidate path from the recorded speech waveform is referred to as “word likelihood of mistakenly utterable candidate path”. A rating for each word can then be determined as shown below. The procedure is not limited to the particular one as described here.

(word rating) = (word likelihood of ideal path) / (word likelihood of ideal path + word likelihood of mistakenly utterable candidate path) × 100

[0073] The rating for each word can be determined and displayed for a sentence pronounced by a learner through the procedure as described above.
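As a worked illustration of the word rating formula in [0072], the sketch below computes the rating for two pairs of hypothetical likelihood values. It assumes the likelihoods are positive quantities rather than log-likelihoods, which the text does not state explicitly; the function name is likewise an assumption.

```python
def rating(ideal: float, mistaken: float) -> float:
    """(ideal) / (ideal + mistaken) x 100, for positive likelihood values."""
    total = ideal + mistaken
    if total <= 0:
        raise ValueError("likelihoods must sum to a positive value")
    return ideal / total * 100


# Hypothetical word likelihoods.
print(rating(0.75, 0.25))  # 75.0
print(rating(0.5, 0.5))    # 50.0
```

The rating approaches 100 as the ideal path dominates the mistakenly utterable candidate path, and falls to 50 when the two are equally likely.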

[0074] Similarly, suppose that each phoneme likelihood determined along the path corresponding to the phoneme array exactly the same as the content-descriptive information is referred to as “phoneme likelihood of ideal path” and the sum of phoneme likelihoods determined along the mistakenly utterable candidate path from the recorded speech waveform is referred to as “phoneme likelihood of mistakenly utterable candidate path”. A rating for each phoneme can then also be determined as follows. This procedure is not limited to the particular one described here.

(phoneme rating) = (phoneme likelihood of ideal path) / (phoneme likelihood of ideal path + phoneme likelihood of mistakenly utterable candidate path) × 100

[0075] In this way, in addition to the rating for each word of a sentence pronounced by a learner, a rating for each phoneme included in the word can be displayed.

[0076] The description above of the present invention is applied to a structure for acquiring speech information for each word by segmenting sentence speech information into phoneme information. However, the structure may be accomplished by directly separating the sentence speech information into speech information for each word.

[0077] [Second Embodiment]

[0078] The first embodiment is described for the structure of the foreign language learning device which recognizes a sentence in a foreign language read out by a learner to display a rating for each word or each phoneme and accordingly enhance the learning efficiency.

[0079] Regarding a second embodiment, a description is given for a structure of a foreign language learning device and a foreign language learning method by which a learner can efficiently practice pronunciation according to the rating for each word (or each phoneme) as described above.

[0080] FIG. 8 is a schematic block diagram illustrating a structure of a foreign language learning device 200 according to the second embodiment.

[0081] Foreign language learning device 200 has its structure basically the same as that of foreign language learning device 100 according to the first embodiment.

[0082] Specifically, referring to FIG. 8, foreign language learning device 200 includes a speech input unit 102 (e.g. microphone) for acquiring speech produced by a learner, an MPU 116 receiving an output of speech input unit 102 for processing speech information corresponding to a sentence pronounced by the learner to determine a rating for pronunciation by the learner for each word included in that sentence in accordance with an expected pronunciation, a CRT display 120 for presenting an original sentence to be pronounced by the learner that is supplied from MPU 116 and displaying a rating for the learner's pronunciation of each word, the rating determined word by word, and a keyboard/mouse 122 for receiving data input to foreign language learning device 200 by the learner.

[0083] Foreign language learning device 200 further includes a learning control unit 101 for controlling the entire operation of the foreign language learning device, a speech recognition unit 114 controlled by learning control unit 101 for performing a speech recognition process on sentence information supplied from the speech input unit, and a data storage unit 118 controlled by learning control unit 101 for storing data necessary for a foreign language learning process.

[0084] Speech recognition unit 114 includes an automatic speech segment unit 140.2 for extracting a speech spectral envelope from speech data supplied from speech input unit 102 and then segmenting a speech signal, a speech likelihood calculating unit 140.4 for calculating a speech likelihood for identifying phonemes of unit language sound, a sentence/word/phoneme separation unit 140.1 for separating a sentence, according to the result of calculation by speech likelihood calculating unit 140.4, and thus extracting a phoneme or a word from the sentence, and a speech recognition unit 140.3 for recognizing a sentence speech based on syntactic parsing or the like, according to the result of separation by sentence/word/phoneme separation unit 140.1.

[0085] Data storage unit 118 includes a sentence database 118.6 holding sentence data to be presented to a learner, a word database 118.5 for words constituting the sentence data, and a phoneme database 118.4 holding data regarding phonemes included in word database 118.5.

[0086] Data storage unit 118 further includes a learner learning history data holding unit 118.1 for holding learning history of the learner, a teacher speech file 118.2 for holding teacher speech pronounced by a native speaker corresponding to the data stored in sentence database 118.6, and a teacher speech likelihood database 118.3 for holding likelihood data calculated by speech recognition unit 114 for speech in the teacher speech file.

[0087] FIG. 9 is a flowchart illustrating a process of foreign language learning by means of foreign language learning device 200 shown in FIG. 8.

[0088] Referring to FIG. 9, foreign language learning device 200 starts its process (step S200), and then a model sentence indicated on CRT display 120 is presented to a learner according to sentence data held in sentence database 118.6 (step S202).

[0089] The learner then reads out the presented model sentence, and speech information corresponding to the model sentence read aloud by the learner is acquired via speech input unit 102 (step S204).

[0090] Then, automatic speech segment unit 140.2 and sentence/word/phoneme separation unit 140.1 operate to recognize speech information corresponding to the sentence as speech information on the basis of phonemes (step S206).

[0091] Speech recognition unit 140.3 recognizes speech information on the basis of words by comparing the speech information on the acquired phonemes with model phonemes according to the data held in phoneme database 118.4 (step S208).

[0092] According to thus recognized speech information, MPU 116 calculates a rating for each word based on the likelihood information calculated by speech likelihood calculating unit 140.4 and data held in teacher speech likelihood database 118.3, and the result of calculation is presented to the learner via CRT display 120 (step S210).

[0093] Then, the learner practices pronunciation word by word or phoneme by phoneme (step S212).

[0094] Then, the learner is asked via CRT display 120 whether or not to practice another model sentence. When the learner selects practice of another model sentence via keyboard/mouse 122, the process returns to step S202. When the learner selects ending of the practice, the process is completed (step S216).

[0095]FIG. 10 is a flowchart illustrating in more detail step S210 forcalculating and displaying a rating for each word and step S212 forpractice of pronunciation word by word or phoneme by phoneme among thosesteps shown in FIG. 9.

[0096] When a score of each word is presented to the learner (stepS302), the learner selects via keyboard/mouse 122 a word for whichtraining should be done (step S304).

[0097] Accordingly, pronunciation of the word by the learner is recorded(step S306), and a score of each phoneme in the word is presented to thelearner (step S308).

[0098] The learner then does training on the basis of phonemes (stepS310), and determination is made as to whether or not the learner haspassed the training on the basis of phonemes (step S312). When thelearner has passed the phoneme training, the process proceeds to thenext step S314. Otherwise, the process returns to step S310.

[0099] When the learner has passed the phoneme training, the processproceeds to training on the basis of words (step S314).

[0100] When the word training is completed, the learner is asked aquestion about whether of not the learner does training for another wordvia CRT display 120. According to information entered by the learnerfrom keyboard/mouse 122, the process returns to step S304 when thelearner takes training of another word. Otherwise, the process proceedsto the next step S318.

[0101] When the training on the basis of words is completed, training on the basis of sentences is done (step S318).

[0102] Then, it is determined whether or not the learner has passed the sentence training (step S320). When the learner has not passed the sentence training, the process returns again to step S302.

[0103] When it is determined that the learner has passed the sentence training, the process is completed (step S322).
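
The control flow of FIG. 10 described above may be sketched as follows. This is only an illustrative outline in Python, not the implementation of the device: the function name training_flow and its three callback parameters are hypothetical, and the learner's choices and the pass/fail decisions are assumed to be supplied by the caller.

```python
from typing import Callable, List


def training_flow(
    select_words: Callable[[], List[str]],
    phoneme_passed: Callable[[str], bool],
    sentence_passed: Callable[[], bool],
) -> List[str]:
    """Return the step labels visited, in order, for one run of the FIG. 10 flow."""
    trace: List[str] = []
    while True:
        trace.append("S302")              # per-word scores are presented
        for word in select_words():       # hypothetical: words the learner picks this round
            trace.append("S304")          # learner selects a word to train
            trace.append("S306")          # pronunciation of the word is recorded
            trace.append("S308")          # per-phoneme scores are presented
            while True:
                trace.append("S310")      # phoneme-level training
                trace.append("S312")      # pass/fail decision on phoneme training
                if phoneme_passed(word):
                    break                 # passed: go on to S314
            trace.append("S314")          # word-level training
        trace.append("S318")              # sentence-level training
        trace.append("S320")              # pass/fail decision on sentence training
        if sentence_passed():
            trace.append("S322")          # process completed
            return trace
```

Returning the list of visited steps is a choice made here only so that the order of steps can be inspected; an actual device would drive the display and the recording at each step instead.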

[0104] FIG. 11 is a flowchart illustrating a learning process performed in advance with respect to a Hidden Markov Model (HMM) for speech recognition so as to calculate a rating for a phoneme, word or sentence for which training is done as shown in FIG. 10.

[0105] Referring to FIG. 11, the learning process starts (step S400), and then a Hidden Markov Model (HMM) is produced for vocabulary with which the training is done (step S402).

[0106] Then, according to pronunciation by the learner, speech with a high articulation is collected (step S404).

[0107] Based on the speech produced by the learner, mel-cepstrum coefficients, LPC (Linear Predictive Coding) cepstrum or the like are used to determine speech features as numerical data (feature vectors) (step S406).

[0108] Based on the speech feature vectors thus determined, training of HMM coefficients of the Hidden Markov Model is done (step S408).

[0109] It is determined whether or not all speech processes that are necessary for learning as described above are done (step S410). If not, the procedure returns to step S406. If done, the procedure is completed (step S412).
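
The feature extraction of step S406 may be illustrated by the following sketch. It is given under stated assumptions: a plain real cepstrum is used as a simple stand-in for the mel-cepstrum or LPC cepstrum named above, the frame size, hop and number of coefficients are arbitrary, and the function names are hypothetical. The HMM training of step S408 is not shown.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform (O(n^2)); adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame: Sequence[float], floor: float = 1e-10) -> List[float]:
    """Inverse DFT of the log magnitude spectrum of one frame."""
    n = len(frame)
    # The floor avoids log(0) for empty spectral bins (an assumption of this sketch).
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def feature_vectors(signal: Sequence[float], size: int = 64, hop: int = 32,
                    n_coeff: int = 12) -> List[List[float]]:
    """Split the signal into overlapping frames and keep the first cepstral coefficients."""
    starts = range(0, len(signal) - size + 1, hop)
    return [real_cepstrum(signal[s:s + size])[:n_coeff] for s in starts]
```

Each returned list corresponds to one feature vector of the kind that would be passed to the training of HMM coefficients.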

[0110] FIG. 12 is a flowchart illustrating a flow of calculating a rating for each phoneme in each word (step S308 in FIG. 10) according to the Hidden Markov Model for which the pre-learning process has been done as shown in FIG. 11.

[0111] Referring to FIG. 12, a process of calculating a rating starts (step S500), speech is input (step S502), and then feature vectors are calculated for each frame segment to be sampled (step S504).

[0112] Then, the Hidden Markov Model is used to perform Viterbi scoring and thus perform a matching calculation for deriving transition of an optimum phoneme (step S506).

[0113] A phoneme transition path is then calculated for all of the possible combinations and whether or not this calculation is completed is determined (step S508). If not, this flow returns to step S506. If completed, the flow proceeds to the next step S510.
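
The Viterbi scoring of step S506 may be illustrated by the following minimal sketch of the standard Viterbi algorithm in the log domain. The probability tables passed in (init, trans, emit) are assumed to come from an already trained model; their layout, the helper safe_log and the function name viterbi are choices of this sketch and are not taken from the embodiment.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """Logarithm that maps probability 0 to minus infinity instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi(init: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """init[s], trans[prev][s] and emit[frame][s] are probabilities.

    Returns the log score of the best state path and the path itself.
    """
    n_states = len(init)
    score = [safe_log(init[s]) + safe_log(emit[0][s]) for s in range(n_states)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + safe_log(trans[p][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + safe_log(trans[best_prev][s])
                             + safe_log(emit[t][s]))
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):   # trace the best path backwards
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path
```

In this sketch each state index stands for one phoneme (or one HMM state of a phoneme), and the returned path gives the frame-by-frame segmentation over which per-frame scores can then be averaged.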

[0114] For each effective frame resultant from segmentation by the Hidden Markov Model, the average of scores for each frame is calculated (step S510).

[0115] A rating is then calculated for each phoneme, for example, according to the calculation as shown below.

(rating) = (score of a phoneme correctly pronounced) / (sum of scores of all combinations of possible (probability is not 0) phonemes) × 100

[0116] The rating is thus calculated and accordingly this process is completed (step S514).
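
The averaging of step S510 and the rating calculation above may be written, for example, as follows. This sketch assumes that the scores are non-negative, probability-like values (not log scores) and that the candidate scores include the score of the correctly pronounced phoneme; the function names are hypothetical.

```python
from typing import Iterable, Sequence


def mean_frame_score(frame_scores: Sequence[float]) -> float:
    """Average of the per-frame scores over the effective frames of one phoneme."""
    if not frame_scores:
        raise ValueError("at least one effective frame is required")
    return sum(frame_scores) / len(frame_scores)


def phoneme_rating(correct_score: float, candidate_scores: Iterable[float]) -> float:
    """(score of correct phoneme) / (sum of scores of all possible phonemes) x 100."""
    # Only candidates whose probability is not 0 contribute to the sum.
    total = sum(s for s in candidate_scores if s > 0.0)
    if total <= 0.0:
        raise ValueError("no candidate phoneme has a non-zero score")
    return correct_score / total * 100.0
```

For instance, with a score of 0.6 for the correct phoneme and scores of 0.6, 0.3 and 0.1 for the possible candidates, the rating is approximately 60.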

[0117] When the learner practices pronunciation phoneme by phoneme, the learning effect is enhanced by presenting appropriate information to the learner as described below.

[0118] FIGS. 13 and 14 show information thus presented when “L” is pronounced, the information being presented by means of a shape of the vocal tract (resonance cavity of sound extending from the glottis to the lips).

[0119] FIGS. 15 and 16 show exemplary computer graphics presenting a shape of the resonance cavity when “R” is pronounced.

[0120] A sound with each phoneme feature is produced by the shape of the vocal tract as described above. Usually, however, the learner cannot see such a shape and movement of the vocal tract.

[0121] In particular, it is possible to visualize, by means of three-dimensional computer graphics, the shapes, relative positions, movements and the like of organs (tongue, palate and the like) in the oral cavity which are closely related to the phoneme features and whose movements the learner can control. For example, the neck part may be made transparent to allow the learner to see and identify that part. Such a visualization makes it possible to provide the learner with knowledge about the way in which each organ should be moved when each phoneme is pronounced.

[0122] FIG. 17 shows change in resonance frequency pattern with time (voice print) that is presented as another example of information to the learner who practices phoneme pronunciation.

[0123] Referring to FIG. 17, respective voice prints of teacher speech and learner speech are compared. The learner repeats pronunciation so that the voice print pattern of the learner approaches that of the teacher speech.

[0124] The voice prints are presented by visualization of change in sound resonance frequency pattern with time by means of a fast Fourier transform (FFT).
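
A voice print of this kind may be computed, for example, by applying an FFT to successive windowed frames of the speech signal. The following sketch assumes a radix-2 FFT, a Hann window, and arbitrary frame size and hop; the function names fft and voice_print are hypothetical and the drawing of the result on the screen is not shown.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[float]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def voice_print(signal: Sequence[float], frame_size: int = 64,
                hop: int = 32) -> List[List[float]]:
    """Return one magnitude spectrum (bins 0..frame_size/2) per frame."""
    if frame_size < 2 or frame_size & (frame_size - 1):
        raise ValueError("frame_size must be a power of two")
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        rows.append([abs(v) for v in fft(frame)[:frame_size // 2 + 1]])
    return rows
```

Each row is the spectrum at one instant, so plotting the rows side by side with time on one axis and frequency on the other gives the change in frequency pattern with time. Bin k corresponds to the frequency k × (sampling rate) / frame_size.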

[0125] Vowels and a part of consonants ([r], [l], [w], [y] and the like) of phonemes are produced with vibration of the vocal tract and such sounds have periodicity. The spectrum of the sound exhibits its peaks (formants) with a certain pattern. Each phoneme is characterized by the pattern of the formants. Then, for these sounds, linear predictive coding (LPC) is used to estimate the peaks of the spectrum, the peaks are superimposed on the voice print and indicated by solid circles in FIG. 17, and accordingly the phoneme feature can clearly be shown.
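
One common way of estimating spectral peaks by linear predictive coding is the autocorrelation method with the Levinson-Durbin recursion, followed by a search for local maxima of the resulting all-pole envelope. The sketch below follows that approach as an assumption; the embodiment does not state which LPC method is used, and the function names, the model order and the frequency grid are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation of the signal for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Solve for the coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("signal has no energy to model")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def lpc_peaks(signal: Sequence[float], sample_rate: float,
              order: int = 8, grid: int = 512) -> List[float]:
    """Frequencies (Hz) at which the LPC envelope 1/|A| has a local maximum."""
    a = levinson_durbin(autocorrelation(signal, order), order)
    env = []
    for i in range(grid + 1):
        w = math.pi * i / grid              # 0 .. pi, i.e. 0 .. Nyquist
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        env.append(1.0 / max(abs(denom), 1e-12))
    nyquist = sample_rate / 2.0
    return [nyquist * i / grid for i in range(1, grid)
            if env[i - 1] < env[i] > env[i + 1]]
```

The returned frequencies, taken from the lowest upward, would serve as candidates for the first, second and third formants; the resolution is limited by the spacing of the frequency grid.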

[0126] FIG. 18 shows a screen presented, as still another example of information, to the learner who practices pronunciation of phonemes, the screen showing the position of a formant.

[0127] Referring to FIG. 18, the position of the formant is confirmed in real time to correct pronunciation. For vowels and the part of consonants ([r], [l], [w], [y] and the like), the formant is calculated as described above to be presented on the screen in real time.

[0128] At this time, the relative relation of three formants (first, second and third formants in order from the lowest), which is important in characterizing a phoneme, is shown by combining two of the three formants in a two-dimensional manner. In FIG. 18, the second formant (F2) is indicated on the horizontal axis and the third formant (F3) is indicated on the vertical axis. The sound L is distributed in the vicinity of F3=2800 Hz while the sound R is distributed in the vicinity of F3=1600 Hz. The formant of sound produced by the learner is indicated by the solid circle, which is understood to be in the region of sound R on the F2-F3 plane.
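
The judgment of the region on the F2-F3 plane may be illustrated by the following sketch. Only the two F3 values stated above (2800 Hz for L and 1600 Hz for R) are taken from the description; deciding by the nearer of the two F3 centres, and the function names, are assumptions of this sketch, and F2 is used only as the horizontal coordinate of the plotted point.

```python
from typing import Tuple

L_F3_HZ = 2800.0   # vicinity of sound L, as stated in the description
R_F3_HZ = 1600.0   # vicinity of sound R, as stated in the description


def plane_point(f2_hz: float, f3_hz: float) -> Tuple[float, float]:
    """Position of the learner's sound: F2 on the horizontal axis, F3 on the vertical."""
    return (f2_hz, f3_hz)


def classify_l_r(f3_hz: float) -> str:
    """Assumed rule: choose the sound whose F3 centre is nearer."""
    return "L" if abs(f3_hz - L_F3_HZ) < abs(f3_hz - R_F3_HZ) else "R"


def f3_error(f3_hz: float, target: str) -> float:
    """Signed distance (Hz) from the F3 centre of the target sound."""
    centre = L_F3_HZ if target == "L" else R_F3_HZ
    return f3_hz - centre
```

For example, a learner aiming at L whose sound has F3 = 1700 Hz falls in the region of R, and the error of -1100 Hz indicates that the third formant should be raised.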

[0129] The learner can proceed with the learning of pronunciation of phonemes while confirming, in real time, the shape of organs for producing his or her sounds and whether or not the shape is correct.

[0130] Although the description above is given separately for each of the three displayed screens as shown in FIGS. 13 to 18, the screens may appropriately be combined to achieve a more efficient pronunciation practice.

[0131] In addition, the model display of the vocal tract shape in FIGS. 13 to 16, display of voice print in FIG. 17, and display of formant in FIG. 18 are presented on the basis of each phoneme. However, when phonemes are successively pronounced as a word, the phonemes may successively be shown on the screen.

[0132] The description above is given for the structure of the foreign language learning device. However, the present invention is not limited to this structure and may be implemented by using a recording medium on which software for performing the foreign language learning method as described above is recorded, and operating the software by a personal computer or the like having a speech input/output function.

[0133] The software for executing the foreign language learning method as described above may not only be installed in a personal computer or the like from a recording medium but also be installed in a personal computer or the like having a speech input/output function through an electrical communication line such as the Internet.

[0134] Although the present invention has been described and illustrated in detail, it is clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of the present invention being limited only by the terms of the appended claims.

1. (amended) A foreign language learning device comprising: word separation means (114) for receiving sentence speech information, the sentence speech information corresponding to speech produced successively by a learner when the learner utters a sentence including a plurality of words, to separate said sentence speech information into word speech information on the basis of each word included in said sentence; likelihood determination means (116) for evaluating degree of matching of each said word speech information with a model speech; and display output means (120) for displaying, for each said word, a resultant evaluation determined by said likelihood determination means.
2. (amended) The foreign language learning device according to claim 1, further comprising storage means (118) for storing a model sentence to be pronounced by said learner and model phoneme array information which corresponds to said model sentence and concerns the whole of said model sentence, wherein said display output means presents said model sentence to said learner in advance, and said word separation means includes phoneme recognition means for recognizing said sentence speech information on the basis of each phoneme information, and word speech recognition means for recognizing said word speech information for each said word according to said phoneme information and said model phoneme array information after the separation.
3. The foreign language learning device according to claim 2, wherein said phoneme recognition means includes phoneme likelihood determination means for determining likelihood of each phoneme information in said sentence speech information, with respect to each of phonemes that can be included in said foreign language, and said likelihood determination means evaluates the degree of matching of each said word speech information by comparing, on a likelihood distribution plane of phoneme information in said sentence speech information, each word likelihood determined along a path followed when pronunciation follows a phoneme array exactly the same as said model phoneme array information with the sum of word likelihoods determined along mistakenly utterable candidate paths from a speech waveform of pronunciation by the learner.
4. (amended) A foreign language learning method comprising the steps of: receiving sentence speech information, the sentence speech information corresponding to speech produced successively by a learner when the learner utters a sentence including a plurality of words, and accordingly separating said sentence speech information into word speech information on the basis of each word included in said sentence (S106, S108); evaluating degree of matching of each said word speech information with a model speech (S110); and displaying, for each said word, a resultant evaluation of each said word speech information (S110).
5. (amended) The foreign language learning method according to claim 4, further comprising the step of presenting a model sentence to said learner in advance (S102), wherein said step of separating said sentence speech information into said word speech information includes the steps of recognizing said sentence speech information on the basis of each phoneme information (S106), and recognizing said word speech information for each said word according to model phoneme array information which corresponds to the model sentence presented to said learner and concerns the whole of said model sentence and according to said phoneme information after the separation (S108).
6. The foreign language learning method according to claim 5, wherein said step of recognizing said sentence speech information on the basis of each phoneme information includes the step of determining likelihood of each phoneme information in said sentence speech information, with respect to each of phonemes that can be included in said foreign language, and in said step of evaluating the degree of matching with the model speech, the degree of matching for each said word is evaluated by comparing, on a likelihood distribution plane of phoneme information in said sentence speech information, each word likelihood determined along a path followed when pronunciation follows a phoneme array exactly the same as said model phoneme array information with the sum of word likelihoods determined along mistakenly utterable candidate paths from a speech waveform of pronunciation by the learner.
7. The foreign language learning method according to claim 5, further comprising the step of evaluating a resultant pronunciation by said learner after practice of the pronunciation, said evaluation made on the basis of each said phoneme and said word in said model sentence uttered by said learner.
8. The foreign language learning method according to claim 7, wherein said step of evaluating a resultant pronunciation after practice thereof includes the step of displaying a vocal tract shape model for each said phoneme via a display unit to said learner.
9. The foreign language learning method according to claim 7, wherein said step of evaluating a resultant pronunciation after practice thereof includes the step of displaying, via a display unit to said learner, a model voice print and a voice print concerning pronunciation by said learner, said voice prints being compared with each other to be displayed.
10. The foreign language learning method according to claim 7, wherein said step of evaluating a resultant pronunciation after practice thereof includes the step of displaying, via a display unit to said learner, position of pronunciation by said learner on a formant plane.
11. A foreign language learning device comprising: storage means (118) for storing a model sentence to be pronounced by a learner and model phoneme array information corresponding to said model sentence; display output means (104, 120) for presenting said model sentence to said learner in advance; word separation means (140.1) for receiving sentence speech information corresponding to a sentence pronounced by said learner to separate the sentence speech information into word speech information on the basis of each word included in said sentence; likelihood determination means (140.4) for evaluating degree of matching of each said word speech information with a model speech; and display output means (120) for displaying, for each phoneme and each said word, a resultant evaluation by said likelihood determination means, said word separation means including phoneme recognition means for recognizing said sentence speech information on the basis of each phoneme information, and word speech recognition means for recognizing said word speech information for each said word according to said phoneme information and said model phoneme array information after the separation, and said foreign language learning device further comprising pronunciation evaluation means for evaluating a resultant pronunciation after practice of the pronunciation for each said phoneme and for each said word in said model sentence uttered by said learner in a pronunciation practice period.
12. The foreign language learning device according to claim 11, wherein said pronunciation evaluation means displays a vocal tract shape model for each said phoneme via a display unit to said learner.
13. The foreign language learning device according to claim 11, wherein said pronunciation evaluation means displays, via a display unit to said learner, a model voice print and a voice print concerning pronunciation by said learner, said voice prints being compared with each other to be displayed.
14. The foreign language learning device according to claim 11, wherein said pronunciation evaluation means displays, via a display unit to said learner, position of pronunciation by said learner on a formant plane.
15. (amended) A computer-readable medium recorded thereon a program for executing a foreign language learning method by a computer, said foreign language learning method comprising the steps of: receiving sentence speech information, the sentence speech information corresponding to speech produced successively by a learner when the learner utters a sentence including a plurality of words, and accordingly separating said sentence speech information into word speech information on the basis of each word included in said sentence; evaluating degree of matching of each said word speech information with a model speech; and displaying, for each said word, a resultant evaluation of each said word speech information.
16. (amended) The computer-readable medium according to claim 15, wherein said foreign language learning method further comprises the step of presenting a model sentence to said learner in advance, wherein said step of separating said sentence speech information into said word speech information includes the steps of recognizing said sentence speech information on the basis of each phoneme information, and recognizing said word speech information for each said word according to model phoneme array information which corresponds to the model sentence presented to said learner and concerns the whole of said model sentence and according to said phoneme information after the separation.
17. The computer-readable medium according to claim 16, wherein said step of recognizing said sentence speech information on the basis of each phoneme information includes the step of determining likelihood of each phoneme information in said sentence speech information, with respect to each of phonemes that can be included in said foreign language, and in said step of evaluating the degree of matching with the model speech, the degree of matching for each said word is evaluated by comparing, on a likelihood distribution plane of phoneme information in said sentence speech information, each word likelihood determined along a path followed when pronunciation follows a phoneme array exactly the same as said model phoneme array information with the sum of word likelihoods determined along mistakenly utterable candidate paths from a speech waveform of pronunciation by the learner.
18. The computer-readable medium according to claim 16, wherein said foreign language learning method further comprises the step of evaluating a resultant pronunciation by said learner after practice of the pronunciation, said evaluation made on the basis of each said phoneme and said word in said model sentence uttered by said learner.
19. The computer-readable medium according to claim 18, wherein said step of evaluating a resultant pronunciation after practice thereof includes the step of displaying a vocal tract shape model for each said phoneme via a display unit to said learner.
20. The computer-readable medium according to claim 18, wherein said step of evaluating a resultant pronunciation after practice thereof includes the step of displaying, via a display unit to said learner, a model voice print and a voice print concerning pronunciation by said learner, said voice prints being compared with each other to be displayed.
21. The computer-readable medium according to claim 18, wherein said step of evaluating a resultant pronunciation after practice thereof includes the step of displaying, via a display unit to said learner, position of pronunciation by said learner on a formant plane.
22. (amended) A computer program for executing a foreign language learning method by a computer, said foreign language learning method comprising the steps of: receiving sentence speech information, the sentence speech information corresponding to speech produced successively by a learner when the learner utters a sentence including a plurality of words, and accordingly separating said sentence speech information into word speech information on the basis of each word included in said sentence; evaluating degree of matching of each said word speech information with a model speech; and displaying, for each said word, a resultant evaluation of each said word speech information.